40 research outputs found

    Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection

    Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD where we treat it as a two-dimensional multi-label image classification problem. To classify the audio segments, we compute their Short-time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions.
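The two signal-processing steps described above — computing an STFT magnitude spectrogram and smoothing frame-level posteriors with a moving-average filter — can be sketched as follows. This is a minimal numpy illustration with hypothetical frame and hop sizes, not the paper's implementation (the actual smoothing there is a convolutional layer inside the CRNN):

```python
import numpy as np

def stft_spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram via a short-time Fourier transform.
    frame_len/hop are illustrative values (25 ms / 10 ms at 16 kHz)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, freq_bins)

def smooth_predictions(probs, width=5, threshold=0.5):
    """Moving-average filter over per-frame speech posteriors, so that
    isolated misclassified spikes are suppressed before thresholding."""
    kernel = np.ones(width) / width
    smoothed = np.convolve(probs, kernel, mode="same")
    return (smoothed > threshold).astype(int)
```

A single-frame spike surrounded by non-speech frames falls below the threshold after averaging, which is the spike-removal effect the abstract describes.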

    Image-based Text Classification using 2D Convolutional Neural Networks

    We propose a new approach to text classification in which we consider the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to get semantically meaningful features from images with text without using optical character recognition and sequential processing pipelines, techniques that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results for a Chinese text classification task and achieved promising results for seven English text classification tasks. Furthermore, our approach outperformed the memory networks without match types when using out-of-vocabulary entities from Task 4 of the bAbI dialog dataset.
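The core operation behind the approach — sliding a 2D kernel over a text-rendered-as-pixels array — is the standard "valid" 2D convolution (cross-correlation) used by CNN layers. A naive numpy sketch, purely illustrative of what a convolutional layer computes over such an image:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2D cross-correlation, the per-channel operation
    a CNN convolutional layer applies to an input image (here, a text
    rendered as a pixel grid)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

In a trained network, many such kernels respond to local visual word patterns, and deeper layers aggregate them into sentence-level features without any OCR step.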

    Comparing CNN and Human Crafted Features for Human Activity Recognition

    Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of using these methods resides in their ability to generate features automatically. This ability greatly simplifies the task of feature extraction, which usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite the advantage of this approach, very little work has been undertaken on analyzing the quality of extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for recognition of simple activities, applying this approach to signals from both inertial and audio sensors. The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold standard Human Crafted Features (HCF) is given, (ii) a comprehensive analysis on how architecture and model parameters affect separation of target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved a 93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with 3 convolutional layers and a kernel size of 32, and a 90.5% F-Score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor and vehicle), using 2D CNNs with 2 convolutional layers and a 2x2 kernel size.
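To make the CNN-vs-HCF comparison concrete, hand-crafted features for an inertial-signal window are typically simple statistics computed per window. A small illustrative subset (the paper's full HCF set is richer; the choice of statistics here is an assumption):

```python
import numpy as np

def hand_crafted_features(window):
    """A few classic hand-crafted features for one windowed segment of an
    inertial signal: mean, standard deviation, min, max, and zero-crossing
    rate. Illustrative subset only."""
    signs = np.signbit(window).astype(int)
    zcr = np.mean(np.abs(np.diff(signs)))  # fraction of sign changes
    return np.array([window.mean(), window.std(),
                     window.min(), window.max(), zcr])
```

Features like these require the domain knowledge the abstract mentions (which statistics discriminate the target activities), whereas CNN features are learned directly from the raw windows.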

    Audio Content Analysis for Unobtrusive Event Detection in Smart Homes

    Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNN) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier, achieved an F1-Score of 90.2% and a recognition accuracy of 91.7%. The second one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-Score of 92.7% and a recognition accuracy of 96%.
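The wavelet half of the feature combination above can be sketched with a single-level Haar discrete wavelet transform, the simplest DWT. This is a minimal numpy sketch of the transform itself, not the paper's exact wavelet family or decomposition depth (both are assumptions here):

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT: split the signal into approximation
    (low-pass) and detail (high-pass) coefficients at half the length."""
    s = signal[:len(signal) // 2 * 2].reshape(-1, 2)  # pairs of samples
    approx = (s[:, 0] + s[:, 1]) / np.sqrt(2)
    detail = (s[:, 0] - s[:, 1]) / np.sqrt(2)
    return approx, detail
```

Statistics of the approximation and detail coefficients (energy, mean, variance, and so on) are then concatenated with the cepstral features before classification.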

    Audio-based Event Recognition System for Smart Homes

    Building an acoustic-based event recognition system for smart homes is a challenging task due to the lack of high-level structures in environmental sounds. In particular, the selection of effective features is still an open problem. We make an important step toward this goal by showing that the combination of Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate, and Discrete Wavelet Transform features can achieve an F1 score of 96.5% and a recognition accuracy of 97.8% with a gradient boosting classifier for ambient sounds recorded in a kitchen environment.
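The feature combination described above amounts to concatenating the three feature families into one vector per audio segment before it reaches the gradient boosting classifier. A minimal sketch, with all shapes and the flattening strategy hypothetical:

```python
import numpy as np

def fused_features(mfcc, zcr, dwt_stats):
    """Concatenate MFCC, zero-crossing-rate, and DWT-derived features into a
    single vector for a gradient boosting classifier. Shapes are illustrative:
    mfcc may be a (frames, coeffs) matrix, zcr a scalar per segment, and
    dwt_stats a 1D vector of wavelet-coefficient statistics."""
    return np.concatenate([np.ravel(mfcc),
                           np.atleast_1d(zcr),
                           np.ravel(dwt_stats)])
```

Tree ensembles such as gradient boosting handle such heterogeneous, unnormalised feature vectors well, which is one plausible reason this combination works for ambient kitchen sounds.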

    Energy-based decision engine for household human activity recognition

    We propose a framework for energy-based human activity recognition in a household environment. We apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Our decision engine achieved a 99.1% accuracy on real-world data collected in the kitchens of two smart homes.
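The two-stage pipeline above (appliance-state inference, then rule-based activity detection) can be sketched in a few lines. The threshold-based state inference is a deliberate simplification of the paper's learned models, and the rules and appliance names are hypothetical:

```python
def appliance_state(power_watts, threshold=5.0):
    """Infer on/off state from instantaneous consumption. A fixed threshold
    stands in for the paper's machine-learned state models."""
    return "on" if power_watts > threshold else "off"

# Hypothetical rule-based scenarios: appliance-state conditions -> activity.
RULES = [
    ({"kettle": "on"}, "preparing hot drink"),
    ({"oven": "on", "hob": "on"}, "cooking"),
]

def detect_activity(states):
    """Return the first activity whose appliance-state condition holds."""
    for condition, activity in RULES:
        if all(states.get(app) == val for app, val in condition.items()):
            return activity
    return "unknown"
```

Keeping the rules separate from the state models means new scenarios can be added without retraining, which is the appeal of this hybrid design.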

    Bacchus Long‐Term (BLT) data set: Acquisition of the agricultural multimodal BLT data set with automated robot deployment

    Achieving a robust long-term deployment with mobile robots in the agriculture domain is both a demanded and challenging task. The possibility to have autonomous platforms in the field performing repetitive tasks, such as monitoring or harvesting crops, collides with the difficulties posed by the always-changing appearance of the environment due to seasonality. With this scope in mind, we report an ongoing effort in the long-term deployment of an autonomous mobile robot in a vineyard, with the main objective of acquiring what we call the Bacchus Long-Term (BLT) Dataset. This dataset consists of multiple sessions recorded in the same area of a vineyard but at different points in time, covering a total of 7 months to capture the whole canopy growth from March until September. The multimodal dataset is acquired with the main focus on supporting the development and evaluation of different mapping and localisation algorithms for long-term autonomous robot operation in the agricultural domain. Hence, besides the dataset, we also present an initial study in long-term localisation using four different sessions belonging to four different months with different plant stages. We identify that state-of-the-art localisation methods can only cope partially with the amount of change in the environment, making the proposed dataset suitable to establish a benchmark on which the robotics community can test its methods. On our side, we anticipate two solutions aimed at extracting stable temporal features for improving long-term 4D localisation results. The BLT dataset is available at https://lncn.ac/lcas-blt

    Human–Computer interaction through affective interfaces

    The development of future systems capable of understanding human affective states and responding to emotional changes is the key aim of Affective Computing. In this context, Automatic Affect Recognition (AAR) is a major challenge and, at the same time, a significantly complex problem. Solutions to this problem can lead to advances in human-computer interaction and may also improve quality of life through, for example, automatic stress detection systems. The present thesis focuses on human-computer interaction by means of affective interfaces. It introduces, for the first time in the field of AAR, biosignal (Galvanic Skin Response - GSR and Electrocardiogram - ECG) processing techniques based on the theory of orthogonal Legendre and Krawtchouk moments. Moreover, novel variations of these moments are proposed with the aim of further enhancing the effectiveness of AAR systems. Furthermore, it introduces novel subject-dependent features extracted from GSR and ECG biosignals, which are capable of reducing the between-subject variability that typically appears in these biosignals. Finally, it introduces new human behavioural features that can be extracted automatically from computer systems, with the aim of automatic stress detection. Focusing on future practical applications of AAR, the proposed methods were experimentally evaluated for their effectiveness in the automatic recognition of boredom, frustration and stress. For this purpose, experiments were designed and conducted to collect data during the induction of the specific affective states. The experimental evaluation showed that the proposed methods have the capability to significantly enhance the field of AAR, towards the development of future practical systems. Finally, focusing on affective Graphical User Interface (GUI) design, this thesis presents the development of a GUI for a mobile device application.
Following affective design principles, the specific GUI was designed with the main aim of pleasing the end user and reducing the possibility that negative emotions (e.g. frustration) are induced by the interaction with it.
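The orthogonal-moment features central to the thesis can be illustrated for the Legendre case: project the biosignal, rescaled onto [-1, 1], onto Legendre polynomials. A plain-moment numpy sketch with an assumed trapezoidal discretisation (the thesis additionally proposes modified Legendre and Krawtchouk variants not shown here):

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(signal, max_order=5):
    """1D Legendre moments of a signal sampled uniformly on [-1, 1]:
    m_n = (2n+1)/2 * integral of f(x) * P_n(x) dx, approximated with the
    trapezoidal rule. Illustrative discretisation, not the thesis's exact one."""
    x = np.linspace(-1.0, 1.0, len(signal))
    dx = x[1] - x[0]
    moments = []
    for n in range(max_order + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0
        p_n = legendre.legval(x, coeffs)  # Legendre polynomial P_n(x)
        y = signal * p_n
        m_n = (2 * n + 1) / 2 * np.sum((y[1:] + y[:-1]) / 2) * dx
        moments.append(m_n)
    return np.array(moments)
```

Because the polynomials are orthogonal, each moment captures an independent component of the GSR or ECG waveform's shape, giving a compact feature vector for the affect classifier.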